Conversational recommender systems (CRSs) often utilize external knowledge graphs (KGs) to introduce rich semantic information and recommend relevant items through natural language dialogues. However, original KGs employed in existing CRSs are often incomplete and sparse, which limits the reasoning capability in recommendation. Moreover, only few of existing studies exploit the dialogue context to dynamically refine knowledge from KGs for better recommendation. To address the above issues, we propose the Variational Reasoning over Incomplete KGs Conversational Recommender (VRICR). Our key idea is to incorporate the large dialogue corpus naturally accompanied with CRSs to enhance the incomplete KGs; and perform dynamic knowledge reasoning conditioned on the dialogue context. Specifically, we denote the dialogue-specific subgraphs of KGs as latent variables with categorical priors for adaptive knowledge graphs refactor. We propose a variational Bayesian method to approximate posterior distributions over dialogue-specific subgraphs, which not only leverages the dialogue corpus for restructuring missing entity relations but also dynamically selects knowledge based on the dialogue context. Finally, we infuse the dialogue-specific subgraphs to decode the recommendation and responses. We conduct experiments on two benchmark CRSs datasets. Experimental results confirm the effectiveness of our proposed method.
translated by 谷歌翻译
Pre-trained language models (LMs) store knowledge in their parameters and can generate informative responses when used in conversational systems. However, LMs suffer from the problem of "hallucination:" they may generate plausible-looking statements that are irrelevant or factually incorrect. To address this problem, we propose a contrastive learning scheme, named MixCL. A novel mixed contrastive objective is proposed to explicitly optimize the implicit knowledge elicitation process of LMs, and thus reduce their hallucination in conversations. We also examine negative sampling strategies of retrieved hard negatives and model-generated negatives. We conduct experiments on Wizard-of-Wikipedia, a public, open-domain knowledge-grounded dialogue benchmark, and assess the effectiveness of MixCL. MixCL effectively reduces the hallucination of LMs in conversations and achieves the highest performance among LM-based dialogue agents in terms of relevancy and factuality. We show that MixCL achieves comparable performance to state-of-the-art KB-based approaches while enjoying notable advantages in terms of efficiency and scalability.
translated by 谷歌翻译
Existing natural language understanding (NLU) models often rely on dataset biases rather than intended task-relevant features to achieve high performance on specific datasets. As a result, these models perform poorly on datasets outside the training distribution. Some recent studies address the above issue by reducing the weights of biased samples during the training process. However, these methods still encode biased latent features in representations and neglect the dynamic nature of bias, which hinders model prediction. We propose an NLU debiasing method, named debiasing contrastive learning (DCT), to simultaneously alleviate the above problems based on contrastive learning. We devise a debiasing positive sampling strategy to mitigate biased latent features by selecting the least similar biased positive samples. We also propose a dynamic negative sampling strategy to capture the dynamic influence of biases by employing a bias-only model to dynamically select the most similar biased negative samples. We conduct experiments on three NLU benchmark datasets. Experimental results show that DCT outperforms state-of-the-art baselines on out-of-distribution datasets while maintaining in-distribution performance. We also verify that DCT can reduce biased latent features from the model's representations.
translated by 谷歌翻译
学习的推荐系统可能会无意间泄露有关其培训数据的信息,从而导致侵犯隐私行为。我们调查了推荐系统通过成员推理面临的隐私威胁。在这种攻击中,对手旨在推断用户的数据是否用于训练目标推荐人。为了实现这一目标,以前的工作使用了阴影推荐人来为攻击模型得出训练数据,然后通过计算用户历史互动和推荐项目之间的差异向量来预测成员资格。最先进的方法面临两个具有挑战性的问题:(1)由于阴影和目标推荐人之间的差距,攻击模型的培训数据偏见,并且(2)推荐人中的隐藏状态没有观察到,导致估计不准确差矢量。为了解决上述局限性,我们提出了针对推荐系统(DL-MIA)框架的成员推理攻击的偏见学习,该框架具有四个主要组件:(1)差异向量生成器,(2)分发式编码器,(3)重量估算器和(4)攻击模型。为了减轻推荐人之间的差距,设计了基于变异的自动编码器(VAE)的分解编码器,以识别推荐人不变和特定功能。为了减少估计偏差,我们设计了一个权重估计器,为每个差异向量分配了真实级别的得分,以指示估计精度。我们对三个现实世界数据集的一般推荐人和顺序推荐人评估了DL-MIA。实验结果表明,DL-MIA有效地减轻了同时减轻培训和估计的偏见,并实现了最先进的攻击性能。
translated by 谷歌翻译
以任务为导向的对话系统(TDSS)主要在离线设置或人类评估中评估。评估通常仅限于单转或非常耗时。作为替代方案,模拟用户行为的用户模拟器使我们能够考虑一组广泛的用户目标,以生成类似人类的对话以进行模拟评估。使用现有的用户模拟器来评估TDSS是具有挑战性的,因为用户模拟器主要旨在优化TDSS的对话策略,并且评估功能有限。此外,对用户模拟器的评估是一个开放的挑战。在这项工作中,我们提出了一个用于端到端TDS评估的隐喻用户模拟器,如果它在与系统的交互中模拟用户的类似思维,则定义模拟器是隐喻的。我们还提出了一个基于测试人员的评估框架,以生成变体,即具有不同功能的对话系统。我们的用户模拟器构建了一个隐喻的用户模型,该模型通过参考遇到新项目时的先验知识来帮助模拟器进行推理。我们通过检查模拟器与变体之间的模拟相互作用来估计模拟器的质量。我们的实验是使用三个TDS数据集进行的。与基于议程的模拟器和三个数据集上的SEQ2SEQ模型相比,隐喻用户模拟器与手动评估的一致性更好。我们的测试人员框架展示了效率,并且可以更好地概括和可扩展性,因为它可以适用于多个域中的对话和多个任务,例如对话建议和电子商务对话。
translated by 谷歌翻译
医疗对话系统(MDSS)旨在协助医生和患者一系列专业医疗服务,即诊断,咨询和治疗。但是,一站式MDS仍然是未开发的,因为:(1)没有数据集如此大规模对话包含多种医疗服务和细粒度的医疗标签(即,意图,插槽,值); (2)没有模型已经根据统一框架中的多服务对话解决了MDS。在这项工作中,我们首先建立一个多域多次服务医学对话(M ^ 2-Meddialog)数据集,其中包含医生和患者的1,557种对话,涵盖276种疾病,2,468种医学实体和3种医疗服务专业。据我们所知,它是唯一包括多种医疗服务和细粒度医疗标签的医疗对话数据集。然后,我们将一站式MDS制定为序列到序列生成问题。我们分别统一MDS,具有因果语言建模和条件因果语言建模。具体而言,我们采用了几种预磨料模型(即,Bert-WWM,BERT-MED,GPT2和MT5)及其变体,以在M ^ 2-MedDialog数据集上获取基准。我们还提出了伪标签和自然扰动方法来扩展M2-MedDialog数据集,并增强最先进的预磨损模型。我们展示了到目前为止通过对M2-MEDDIALOG的大量实验来实现的结果。我们释放DataSet,代码以及评估脚本,以促进在这方面的未来研究。
translated by 谷歌翻译
缺乏外部知识使同志对话系统难以察觉隐含的情绪,并从有限的对话历史上学习情绪相互作用。为了解决上述问题,我们建议利用外部知识,包括致命知识和情绪词汇知识,以明确了解和表达在同情对话中的情绪。我们首先通过与外部知识共同互动并构建情感语境图来丰富对话史。然后,我们从知识丰富的情绪上下文图和蒸馏情绪信号中学习情绪背景陈述,这是在反应中表达的谓词情绪的先决条件。最后,为了产生同志反应,我们提出了一种情绪跨关注机制来从情绪上下文图中学习情绪依赖。在基准数据集上进行的广泛实验验证了该方法的有效性。此外,我们发现通过与正交工作的预先训练的模型集成,可以进一步提高我们的方法的性能。
translated by 谷歌翻译
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot or only marginally benefit from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distilling targets, losses, input, network regularization, sequential distillation, etc, revealing that: 1) Distilling token relations is more effective than CLS token- and feature-based distillation; 2) An intermediate layer of the teacher network as target perform better than that using the last layer when the depth of the student mismatches that of the teacher; 3) Weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over the scratch MIM pre-training on ImageNet-1K classification, using all the ViT-Tiny, ViT-Small, and ViT-base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU in AE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way for developing small vision Transformer models, that is, by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
translated by 谷歌翻译
This paper presents a practical global optimization algorithm for the K-center clustering problem, which aims to select K samples as the cluster centers to minimize the maximum within-cluster distance. This algorithm is based on a reduced-space branch and bound scheme and guarantees convergence to the global optimum in a finite number of steps by only branching on the regions of centers. To improve efficiency, we have designed a two-stage decomposable lower bound, the solution of which can be derived in a closed form. In addition, we also propose several acceleration techniques to narrow down the region of centers, including bounds tightening, sample reduction, and parallelization. Extensive studies on synthetic and real-world datasets have demonstrated that our algorithm can solve the K-center problems to global optimal within 4 hours for ten million samples in the serial mode and one billion samples in the parallel mode. Moreover, compared with the state-of-the-art heuristic methods, the global optimum obtained by our algorithm can averagely reduce the objective function by 25.8% on all the synthetic and real-world datasets.
translated by 谷歌翻译
Score-based diffusion models have captured widespread attention and funded fast progress of recent vision generative tasks. In this paper, we focus on diffusion model backbone which has been much neglected before. We systematically explore vision Transformers as diffusion learners for various generative tasks. With our improvements the performance of vanilla ViT-based backbone (IU-ViT) is boosted to be on par with traditional U-Net-based methods. We further provide a hypothesis on the implication of disentangling the generative backbone as an encoder-decoder structure and show proof-of-concept experiments verifying the effectiveness of a stronger encoder for generative tasks with ASymmetriC ENcoder Decoder (ASCEND). Our improvements achieve competitive results on CIFAR-10, CelebA, LSUN, CUB Bird and large-resolution text-to-image tasks. To the best of our knowledge, we are the first to successfully train a single diffusion model on text-to-image task beyond 64x64 resolution. We hope this will motivate people to rethink the modeling choices and the training pipelines for diffusion-based generative models.
translated by 谷歌翻译